The first World Happiness Report was released in April 2012 in support of a United Nations High-Level Meeting on well-being and happiness. TED talks, conferences, fields of study, and decades of research are dedicated to the question "What makes us happy?" The report aimed to answer that question through "happiness indicators". The researchers behind the report used data from the Gallup World Poll, in particular answers to the Cantril ladder question (https://news.gallup.com/poll/122453/understanding-gallup-uses-cantril-scale.aspx), which asks respondents to imagine a ladder running from their worst possible life (0) to their best possible life (10) and to place their current life on that scale. This score is broken into six factors: "economic production, social support, life expectancy, freedom, absence of corruption, and generosity".
The data were downloaded from https://worldhappiness.report/ed/2021/#appendices-and-data under the Data Panel and Mortality Data links. The Data Panel includes happiness score data for multiple years. In this analysis, we examine changes from 2018 to 2020. The Mortality Data includes further country metrics, some of which we use as confounders, and death metrics, including those related to COVID-19.
I originally obtained this section of data from https://www.kaggle.com/unsdsn/world-happiness. I imported the data using fread() and used summary() to search for missing values. The only missing value was in year1's Perceptions of corruption. I replaced that NA with year2's value, which is acceptable because a change (delta) of zero from year to year will not affect my question, and it lets me keep the other values in that country's row. Strangely, this column had a different datatype in each year (char in one, dbl in the other), so I cast both as doubles. After that, I was able to merge the datasets and order them by Overall rank. Another category of missing values involved countries that lacked specific subsets of scores. These values were often missing for both years under examination, so they were addressed in the same way.
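The cleaning steps above can be sketched in base R as follows. This is a minimal illustration, not the original script: the two toy tables, their column names (`country`, `rank`, `corruption`), and all values are invented stand-ins, and base `merge()` is used here in place of whatever merge call the analysis actually ran.

```r
# Toy stand-ins for the two yearly tables (hypothetical names and values).
year1 <- data.frame(
  country = c("A", "B", "C"),
  rank = c(2L, 1L, 3L),
  corruption = c("0.30", "N/A", "0.10"),  # read in as character, one missing
  stringsAsFactors = FALSE
)
year2 <- data.frame(
  country = c("A", "B", "C"),
  rank = c(1L, 2L, 3L),
  corruption = c(0.28, 0.15, 0.12),       # read in as double
  stringsAsFactors = FALSE
)

# Cast the character column to double; "N/A" becomes NA.
year1$corruption <- suppressWarnings(as.numeric(year1$corruption))

# Fill the missing year-1 value with the same country's year-2 value.
idx <- match(year1$country, year2$country)
missing <- is.na(year1$corruption)
year1$corruption[missing] <- year2$corruption[idx[missing]]

# Merge the two years side by side and compute the year-over-year change.
both <- merge(year1, year2, by = "country", suffixes = c("_y1", "_y2"))
both$corruption_delta <- both$corruption_y2 - both$corruption_y1

# Order by year-1 overall rank.
both <- both[order(both$rank_y1), ]
print(both)
```

Because the filled-in value equals the year-2 value, the delta for that row is exactly zero, so the row contributes nothing to the corruption comparison while its other columns stay usable.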
I examined the dimensions and base R summary statistics of the years dataset. Notably, Finland holds the number one spot in both years, with a score of 7.632 in 2018; its perceived healthy life expectancy went up by ~0.1 while its perceived freedom went down by ~0.1. Also, Norway and Denmark swapped positions for 2nd and 3rd rank. On the opposite end of the spectrum, Burundi and South Sudan have the lowest happiness scores at 2.905 and 2.853 respectively. The lowest six countries report subscores all below 1 point, and the Central African Republic has scores of 0 for social support in both years. One of the few other 0's was Afghanistan for freedom to make life choices.
The corrplot correlation matrix revealed the relationships between the subscores. Seeing these relationships is important for other questions and for understanding the "status quo" of happiness indicators. GDP per capita and Healthy life expectancy were the most highly correlated pair. I also looked at the shifting distributions of the subscores through histograms. Visually, the main changes were an increase in perceived corruption for the first quartile of countries (which means a lower score) and an increase in all countries' perceptions of Healthy life expectancy. This is also reflected in their summary statistics. In boxplots, GDP seemed to increase, but only because the scale shifted. The shift was due to an outlier that was present in year1 but not in year2: the United Arab Emirates' GDP score dropped from 2.096 to 1.684. This might indicate a change in how much happiness GDP per capita gives the citizens. Finally, the freedom subscore decreases, and the perception of corruption score has the most outliers because the majority of its values are low.
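As a rough sketch of the two checks described above, the snippet below builds a correlation matrix and flags boxplot outliers using only base R. The data frame and its values are invented for illustration; in the real analysis the matrix from `cor()` would be passed to `corrplot()` for plotting.

```r
# Invented subscores for six hypothetical countries.
scores <- data.frame(
  gdp = c(0.9, 1.0, 1.1, 1.2, 1.3, 2.1),
  life_expectancy = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
  generosity = c(0.25, 0.10, 0.30, 0.15, 0.20, 0.18)
)

# Pairwise Pearson correlations between the subscores.
cor_mat <- cor(scores)
print(round(cor_mat, 2))

# Values more than 1.5 times the hinge spread beyond the hinges,
# which is the rule boxplot() uses to draw outlier points.
gdp_outliers <- boxplot.stats(scores$gdp)$out
print(gdp_outliers)
```

A single extreme value like the 2.1 here stretches the boxplot's axis, which is why removing it in a later year can make the rest of the distribution look as if it moved.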
The first comparison scatter plot, of healthy life expectancy, doesn't show much change between years. The second scatter plot, Generosity vs Score, shows high dispersion. Each stratum is clustered tightly within its y-value band, but there is more variability along the x-axis.
Many countries are missing excess death values. This may be due to difficulty with data collection in those regions or to misreporting. This map shows the per capita magnitude of the impact of 2020 (primarily COVID-19). The map also has hover functionality to show happiness scores.
This map color-codes countries by life ladder happiness score, and hovering over a country shows its excess death numbers. This allows the user to examine relationships between the two.
Here we see which confounders affect the relationship between happiness subscores and excess deaths. The findings are available in the table tab, as well as in the form of regression plots.
For my first question: Based on the combined scatter/line plots and factorized years, the most significant changes between years come from increases in Healthy life expectancy and decreases in Freedom to make choices. These changes can be seen as right and left translations along the x-axis. It's likely these changes aren't as visible on the 'Score' y-axis because the changes (~0.1 each) cancel out when they contribute to the overall score. These findings agree with and expand upon the EDA histogram and boxplot findings. The addition of the scatter/line plot makes the trends more visible.
For my second question: All subscores show high dispersion against excess deaths, with or without the two confounders. In fact, a linear model likely does not describe the relationship between these variables well. The confounders also have only marginal effects on subscore significance. Most importantly, there is no subscore for which adding the confounders of median age and index of institutional trust moves the p-value across the p=.05 threshold. Therefore, we conclude that these confounders do not have a notable influence on the relationship between subscores and excess deaths.
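The confounder check described above can be sketched as two linear models per subscore, one without and one with the confounders, followed by a comparison of the subscore's p-value. This assumes ordinary least squares via `lm()`, which the text does not state explicitly; the simulated data, variable names, and coefficients below are all hypothetical.

```r
# Simulated data for 60 hypothetical countries (not the report's data).
set.seed(1)
n <- 60
median_age <- runif(n, 18, 45)
trust <- runif(n, 0, 1)
subscore <- rnorm(n, mean = 1, sd = 0.3)
excess_deaths <- 50 - 10 * subscore + rnorm(n, sd = 20)
d <- data.frame(excess_deaths, subscore, median_age, trust)

# Model 1: subscore only. Model 2: subscore plus the two confounders.
fit_plain <- lm(excess_deaths ~ subscore, data = d)
fit_adj <- lm(excess_deaths ~ subscore + median_age + trust, data = d)

# Extract the p-value of the subscore coefficient from each model.
p_plain <- summary(fit_plain)$coefficients["subscore", "Pr(>|t|)"]
p_adj <- summary(fit_adj)$coefficients["subscore", "Pr(>|t|)"]

# TRUE only if adjusting for the confounders flips significance at 0.05.
crosses <- (p_plain < 0.05) != (p_adj < 0.05)
cat("unadjusted p:", p_plain, " adjusted p:", p_adj,
    " crosses 0.05:", crosses, "\n")
```

Repeating this for each subscore and finding `crosses` to be `FALSE` every time corresponds to the conclusion that the confounders never change which subscores are significant.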
These plots overlay the years to make shifts in scores more visually salient. The regression lines indicate the trend within each stratum. The changes are particularly pronounced in healthy life expectancy and freedom to make life choices.
Of course, many complex factors contribute to national and individual happiness levels. They range from political and economic stability to social status and more. The Happiness Report provides a key benchmark for our understanding of happiness around the world.
From 2018 to 2019, there was not much change in happiness levels or their subscores. Most changes in subscores were on the order of 10^-1. As stated before, the key changes were increases in Healthy life expectancy and decreases in Freedom scores. These findings, and any other changes in happiness level, raise the question of why. What social, local, or global effects cause happiness levels to change or stay the same?
For 2019 to 2020, excess deaths are used in this report as a surrogate for several items. One is how closely populations followed government safety guidelines around COVID-19. Unfortunately, our confounders do not account for all epidemiological elements of disease spread. Specifically, we focus on country median age to measure the role of disease susceptibility in populations, and on an index of institutional trust as one component of the likelihood of following safety guidelines. Overall, we found the confounders did not change the relationship between happiness subscores and excess deaths. Generosity (p=.002) and perceptions of corruption (p=.005) were the most significant. Freedom to make life choices was the least significant, while all other subscores were significant. Therefore, we conclude that happiness on the life ladder and the happiness subscores are associated with excess deaths in 2020. In fact, they appear to have a protective effect against excess deaths: a happier, wealthier, more supported, healthier, more generous, less corrupt society keeps more citizens alive year over year.
Social support